Auditory-based automatic speech recognition

نویسندگان

  • Werner Hemmert
  • Marcus Holmberg
  • David Gelbart
چکیده

In this paper we develop a physiologically motivated model of peripheral auditory processing and evaluate how the different processing steps influence automatic speech recognition in noise. The model features large dynamic compression (>60 dB) and a realistic sensory cell model. The compression range was well matched to the limited dynamic range of the sensory cells and the model yielded surprisingly high recognition scores. We also developed a computationally efficient simplified model of auditory processing and found that a model of adaptation could improve recognition accuracy. Adaptation is a basic principle of neuronal processing, which accentuates signal onsets. Applying this adaptation model to melfrequency cepstral coefficient (MFCC) feature extraction enhanced recognition accuracy in noise (AURORA 2 task, averaged recognition scores) from 56.4% to 75.6% (clean training condition), a relative improvement of 41% in word error rate. Adaptation outperformed RASTA processing by more than 10%, which corresponds to a relative improvement of 31%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Applying physiologically-motivated models of auditory processing to automatic speech recognition

For many years the human auditory system has been an inspiration for developers of automatic speech recognition systems because of its ability to interpret speech accurately in a wide variety of difficult acoustical environments. This paper discusses the application of physiologically-motivated approaches to signal processing that facilitate robust automatic speech recognition in environments w...

متن کامل

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract   Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments

In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition ...

متن کامل

مدل میکروسکوپی دوگوشی مبتنی بر فیلتر بانک مدولاسیون برای پیش گویی قابلیت فهم گفتار در افراد دارای شنوایی عادی

In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, the spectral criteria such as the STI and SII or other analytical methods have been used in the binaural models to determine the binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...

متن کامل

Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants

Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004